Reconstructing Strings from Substrings in Rounds
نویسندگان
چکیده
We establish a variety of combinatorial bounds on the tradeoos inherent in reconstructing strings using few rounds of a given number of substring queries per round. These results lead us to propose a new approach to sequencing by hybridization (SBH), which uses interaction to dramatically reduce the number of oligonucleotides used for de novo sequencing of large DNA fragments, while preserving the parallelism which is the primary advantage of SBH.
منابع مشابه
A BSP/CGM Algorithm for the All-Substrings Longest Common Subsequence Problem
Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y . The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < p m processors that takes O(mn=p) time an...
متن کاملSequencing by Hybridization in Few Rounds
Sequencing by Hybridization (SBH) is a method for reconstructing an unknown DNA string based on substring queries: Using hybridization experiments, one can determine for each string in a given set of strings, whether the string appears in the target string, and use this information to reconstruct the target string. We study the problem when the queries are performed in rounds, where the queries...
متن کاملPASS-JOIN: A Partition-based Method for Similarity Joins
As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. Existing algorithms are efficient either for short strings or for long stri...
متن کاملAn Efficient Algorithm for Finding Similar Short Substrings from Large Scale String Data
Finding similar substrings/substructures is a central task in analyzing huge amounts of string data such as genome sequences, web documents, log data, etc. In the sense of complexity theory, the existence of polynomial time algorithms for such problems is usually trivial since the number of substrings is bounded by the square of their lengths. However, straightforward algorithms do not work for...
متن کاملAnalysis of correlation structures in the Synechocystis PCC6803 genome
Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding ...
متن کامل